source("utils.R")POC modeled VS POC observed
Investigate whether modeled POC is representative of POC observations.
POC observation data
Read data
Read data, keep columns of interest, rename, drop observations where poc is missing, and keep only sediment traps.
df_raw <- read_delim("data/raw/GO_flux.tab", delim = "\t", escape_double = FALSE, trim_ws = TRUE, skip = 87, show_col_types = FALSE)
# Select columns of interest and rename
df <- df_raw %>%
select(
id_ref = `ID (Reference identifier)`,
id_loc = `ID (Unique location identifier)`,
type = `Type (Data type)`,
lon = Longitude,
lat = Latitude,
depth = `Depth water [m] (Sediment trap deployment depth)`,
start_date = `Date/Time (Deployed)`,
end_date = `Date/time end (Retrieved)`,
duration = `Duration [days]`,
poc = `POC flux [mg/m**2/day]`
) %>%
# drop observations where poc is missing
drop_na(poc) %>%
# only sediment traps
filter(type == "sediment_trap")Distribution in space and time
Number of observations at each location.
df %>%
add_count(lon, lat) %>%
ggplot() +
geom_polygon(data = world, aes(x = lon, y = lat, group = group), fill = "grey") +
geom_point(aes(x = lon, y = lat, colour = n)) +
#scale_colour_cmocean(name = "deep") +
scale_colour_viridis_c(trans = "log10") +
coord_quickmap(expand = 0)Depth distribution
ggplot(df) + geom_histogram(aes(x = depth), binwidth = 100)Time distribution
ggplot(df) + geom_histogram(aes(x = start_date))ggplot(df) + geom_histogram(aes(x = duration)) + scale_x_log10()Data points around the 1000 m horizon
Let’s focus on the data around 1000 m depth. For a set of depth range between 50 and 500 m, compute number and map observations.
hor <- 1000
ranges <- lapply(seq(from = 100, to = 500, by = 50), function(W){
d <- df %>%
filter(between(depth, hor - W, hor + W)) %>% # keep points within the depth range
mutate(W = W)
return(d)
})
ranges <- do.call(bind_rows, ranges)
ranges %>%
count(W) %>%
ggplot() +
geom_col(aes(x = W, y = n)) +
ggtitle("Number of data points within 1000 ± W depth range")ranges %>%
ggplot() +
geom_polygon(data = world, aes(x = lon, y = lat, group = group), fill = "grey") +
geom_point(aes(x = lon, y = lat, colour = depth)) +
scale_colour_cmocean(name = "deep") +
coord_quickmap(expand = 0) +
facet_wrap(~W)Fixed depth range
Let’s use a depth range of 200 meters. Look at the distribution of POC values (transformed and logged).
W <- 500
df <- df %>%
filter(between(depth, hor - W, hor + W)) %>%
mutate(poc_log = log(poc))
ggplot(df) + geom_histogram(aes(x = poc))ggplot(df) + geom_histogram(aes(x = poc_log)) # yayLog-transformed poc is closer to a normal distribution, but if we want to predict modeled poc from observed poc, depending on the prediction model, the distribution of the predictor may not be important.
Let’s now round coordinates to match with Wang et al., 2023 which uses a grid of 2°×2°.
# average by pixel
df_pix <- df %>%
mutate(
# floor longitude and add 1 because carbon longitudes are odd
lon = roundp(lon, precision = 2, f = floor) + 1,
# round latitude because carbon latitudes are even
lat = roundp(lat, precision = 2, f = round)
) %>%
group_by(lon, lat) %>%
summarise(
poc_mean = mean(poc),
poc_sd = sd(poc)
) %>%
ungroup()
df_pix %>%
ggplot() +
geom_polygon(data = world, aes(x = lon, y = lat, group = group), fill = "grey") +
geom_point(aes(x = lon, y = lat, colour = poc_mean)) +
scale_colour_cmocean(name = "matter") +
coord_quickmap(expand = 0)df_pix %>%
ggplot() +
geom_polygon(data = world, aes(x = lon, y = lat, group = group), fill = "grey") +
geom_point(aes(x = lon, y = lat, colour = poc_sd)) +
scale_colour_viridis_c() +
coord_quickmap(expand = 0)Let’s check the number of observations per pixel.
df %>%
mutate(
# floor longitude and add 1 because carbon longitudes are odd
lon = roundp(lon, precision = 2, f = floor) + 1,
# round latitude because carbon latitudes are even
lat = roundp(lat, precision = 2, f = round)
) %>%
count(lon, lat) %>%
ggplot() +
geom_polygon(data = world, aes(x = lon, y = lat, group = group), fill = "grey") +
geom_point(aes(x = lon, y = lat, colour = n)) +
scale_colour_viridis_c(trans = "log10") +
coord_quickmap(expand = 0)Looks like we have 2 timeseries with > 300 observations.
Match with POC modeled data
Read modeled data
load("data/00.carbon_data.Rdata")
df_c <- df_c %>%
# Rename poc
rename(poc_mod = poc) %>%
# Convert from mmol C m-2 year-1 to mg C m-2 day-1 (divide by 365.25 and multiply by 12)
mutate(poc_mod = (poc_mod / 365.25)*12)
ggmap(df_c, var = "poc_mod", type = "raster") +
scale_fill_cmocean(name = "matter", na.value = NA) +
geom_point(data = df_pix, aes(x = lon, y = lat), size = 0.5)Join
Join by coordinates, drop observations where modeled poc is not available.
df_pix <- df_pix %>%
left_join(df_c, by = join_by(lon, lat)) %>%
drop_na(poc_mod)Let’s plot a map of our final dataset.
ggplot(df_pix) +
geom_polygon(data = world, aes(x = lon, y = lat, group = group), fill = "grey") +
geom_point(aes(x = lon, y = lat)) +
coord_quickmap(expand = 0)Let’s also have a look at the data distribution.
ggplot(df_pix) + geom_histogram(aes(x = poc_mean))ggplot(df_pix) + geom_histogram(aes(x = poc_mod))Let’s still plot modeled VS observations, both in untransformed and log-transformed spaces.
ggplot(df_pix) +
geom_point(aes(x = poc_mean, y = poc_mod)) +
geom_abline(intercept = 0, slope = 1, colour = "red")ggplot(df_pix) +
geom_point(aes(x = poc_mean, y = poc_mod)) +
geom_abline(intercept = 0, slope = 1, colour = "red") +
scale_x_log10() +
scale_y_log10()Let’s compute the correlation.
# Untransformed
cor(df_pix$poc_mean, df_pix$poc_mod)[1] 0.1169209
# Log-transformed
cor(log(df_pix$poc_mean), log(df_pix$poc_mod))[1] 0.2019889